{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "toc": "true"
   },
   "source": [
    "# Table of Contents\n",
    "<div class=\"lev1 toc-item\"><a href=\"#Exercise-3.5.--Try-out-gradient-descent\" data-toc-modified-id=\"Exercise-3.5.--Try-out-gradient-descent-5\"><span class=\"toc-item-num\">5&nbsp;&nbsp;</span>Exercise 3.5.  Try out gradient descent</a></div><div class=\"lev1 toc-item\"><a href=\"#Exercise-3.6.-Compare-fixed-and-diminishing-steplengths-for-a-simple-example\" data-toc-modified-id=\"Exercise-3.6.-Compare-fixed-and-diminishing-steplengths-for-a-simple-example-6\"><span class=\"toc-item-num\">6&nbsp;&nbsp;</span>Exercise 3.6. Compare fixed and diminishing steplengths for a simple example</a></div><div class=\"lev1 toc-item\"><a href=\"#Exercise-3.9.-Code-up-momentum-accelerated-gradient-descent\" data-toc-modified-id=\"Exercise-3.9.-Code-up-momentum-accelerated-gradient-descent-9\"><span class=\"toc-item-num\">9&nbsp;&nbsp;</span>Exercise 3.9. Code up momentum-accelerated gradient descent</a></div>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import basic libraries and autograd wrapped numpy\n",
    "import autograd.numpy as np\n",
    "import copy\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "\n",
    "# this is needed to compensate for matplotlib notebook's tendancy to blow up images when plotted inline\n",
    "%matplotlib notebook\n",
    "from matplotlib import rcParams\n",
    "rcParams['figure.autolayout'] = True"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise 3.5.  Try out gradient descent"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this exercise you will implement gradient descent using the hand-computed derivative.\n",
    "\n",
    "$$\\frac{\\partial}{\\partial w}g(w) = \\frac{1}{50}\\left(4w^3 + 2w + 10 \\right)$$\n",
    "\n",
    "A skeleton of the desired algorithm is in the cell below.  All parts marked \"TO DO\" are for you to construct."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "# gradient descent function - inputs: g (input function), alpha (steplength parameter), max_its (maximum number of iterations), w (initialization)\n",
    "def gradient_descent(alpha,max_its,w):\n",
    "    # cost for this example\n",
    "    g = lambda w: 1/50*(w**4 + w**2 + 10*w)\n",
    "    \n",
    "    # the gradient function for this example\n",
    "    grad = lambda w: 1/50*(4*w**3 + 2*w + 10)\n",
    "\n",
    "    # run the gradient descent loop\n",
    "    cost_history = [g(w)]        # container for corresponding cost function history\n",
    "    for k in range(1,max_its+1):       \n",
    "        # evaluate the gradient, store current weights and cost function value\n",
    "        ## TO DO\n",
    "\n",
    "        # take gradient descent step\n",
    "        ## TO DO\n",
    "            \n",
    "        # collect final weights\n",
    "        cost_history.append(g(w))  \n",
    "    return cost_history"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# initial point\n",
    "w = 2.0\n",
    "max_its = 1000\n",
    "\n",
    "# produce gradient descent runs\n",
    "alpha = 10**(0)\n",
    "cost_history_1 = gradient_descent(alpha,max_its,w)\n",
    "\n",
    "alpha = 10**(-1)\n",
    "cost_history_2 = gradient_descent(alpha,max_its,w)\n",
    "\n",
    "alpha = 10**(-2)\n",
    "cost_history_3 = gradient_descent(alpha,max_its,w)\n",
    "\n",
    "# plot cost function histories\n",
    "## TO DO"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise 3.6. Compare fixed and diminishing steplengths for a simple example"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "In this exercise you will compare a fixed steplength scheme and a the diminishing steplength rule to minimize the function\n",
    "\n",
    "\\begin{equation}\n",
    "g(w) = \\left \\vert w \\right \\vert.\n",
    "\\end{equation}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Notice that this function has a single global minimum at $w = 0$ and a derivative defined (everywhere but at $w = 0$)\n",
    "\n",
    "\\begin{equation}\n",
    "\\frac{\\mathrm{d}}{\\mathrm{d}w}g(w) = \\begin{cases}\n",
    "+1 \\,\\,\\,\\,\\,\\text{if} \\,\\, w > 0 \\\\\n",
    "-1 \\,\\,\\,\\,\\,\\text{if} \\,\\, w < 0.\n",
    "\\end{cases}\n",
    "\\end{equation}\n",
    "\n",
    "which makes the use of any fixed steplength scheme problematic for gradient descent.  \n",
    "\n",
    "Below you will make two runs of $20$ steps of gradient descent each initialized at the point $w^0 = 2$, the first with a fixed steplength rule of $\\alpha = 0.5$ (left panel) for each and every step, and the second using the diminishing steplength rule $\\alpha = \\frac{1}{k}$ (right panel).\n",
    "\n",
    "A skeleton of the desired algorithm is in the cell below.  All parts marked \"TO DO\" are for you to construct.  Note here you will use `autograd` to construct the gradient function for $g$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# import automatic differentiator to compute gradient module\n",
    "from autograd import grad \n",
    "\n",
    "# gradient descent function - inputs: g (input function), alpha (steplength parameter), max_its (maximum number of iterations), w (initialization)\n",
    "def gradient_descent(g,alpha,max_its,w):\n",
    "    # compute gradient module using autograd\n",
    "    gradient = grad(g)\n",
    "\n",
    "    # run the gradient descent loop\n",
    "    weight_history = [w]           # container for weight history\n",
    "    cost_history = [g(w)]          # container for corresponding cost function history\n",
    "    for k in range(max_its):\n",
    "        # evaluate the gradient, store current weights and cost function value\n",
    "        ## TO DO\n",
    "\n",
    "        # take gradient descent step\n",
    "        ## TO DO\n",
    "        \n",
    "        # record weight and cost\n",
    "        weight_history.append(w)\n",
    "        cost_history.append(g(w))\n",
    "    return weight_history,cost_history"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Compute the cost function history associated with each desired run and plot both to compare."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Exercise 3.9. Code up momentum-accelerated gradient descent"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A skeleton of the desired algorithm is in the cell below.  All parts marked \"TO DO\" are for you to construct"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "from autograd import numpy as np\n",
    "from autograd import value_and_grad \n",
    "\n",
    "# gradient descent function - inputs: g (input function), alpha (steplength parameter), max_its (maximum number of iterations), w (initialization)\n",
    "def momentum(g,alpha,beta,max_its,w):\n",
    "    # compute the gradient function of our input function - note this is a function too\n",
    "    # that - when evaluated - returns both the gradient and function evaluations (remember\n",
    "    # as discussed in Chapter 3 we always ge the function evaluation 'for free' when we use\n",
    "    # an Automatic Differntiator to evaluate the gradient)\n",
    "    gradient = value_and_grad(g)\n",
    "\n",
    "    # run the gradient descent loop\n",
    "    weight_history = []      # container for weight history\n",
    "    cost_history = []        # container for corresponding cost function history\n",
    "    alpha = 0\n",
    "    cost_eval,grad_eval = gradient(w)\n",
    "    \n",
    "    # initialization for momentum direction\n",
    "    h = np.zeros((w.shape))\n",
    "    for k in range(1,max_its+1):        \n",
    "        # evaluate the gradient, store current weights and cost function value\n",
    "        cost_eval,grad_eval = gradient(w)\n",
    "        weight_history.append(w)\n",
    "        cost_history.append(cost_eval)\n",
    "        \n",
    "        #### momentum step - update exponential average of gradient directions to ameliorate zig-zagging ###\n",
    "        ## TODO \n",
    "\n",
    "        # take gradient descent step\n",
    "        w = w + alpha*h\n",
    "            \n",
    "    # collect final weights\n",
    "    weight_history.append(w)\n",
    "    # compute final cost function value via g itself (since we aren't computing \n",
    "    # the gradient at the final step we don't get the final cost function value \n",
    "    # via the Automatic Differentiatoor) \n",
    "    cost_history.append(g(w))  \n",
    "    return weight_history,cost_history"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run momentum gradient descent to minimize the function described in the text.  Below a skeleton of the desired code is provided.  You will need to plot the cost function histories associated with each run to produce the final comparison."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# define constants for a N=2 input quadratic\n",
    "a1 = 0\n",
    "b1 = 0*np.ones((2,1))\n",
    "C1 = np.array([[0.5,0],[0,9.75]])\n",
    "\n",
    "# a quadratic function defined using the constants above\n",
    "g = lambda w: (a1 + np.dot(b1.T,w) + np.dot(np.dot(w.T,C1),w))[0]\n",
    "\n",
    "w = np.array([10.0,1.0]); max_its = 25; alpha_choice = 10**(-1);\n",
    "beta = 0\n",
    "weight_history_1,cost_history_1 = momentum(g,alpha_choice,beta,max_its,w)\n",
    "\n",
    "beta = 0.1;\n",
    "weight_history_2,cost_history_2 = momentum(g,alpha_choice,beta,max_its,w)\n",
    "\n",
    "beta = 0.7\n",
    "weight_history_3,cost_history_3 = momentum(g,alpha_choice,beta,max_its,w)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$f$ has Lipschitz continuous gradient with constant $J$, and $g$\n",
    "is Lipschitz continuous with constant $K$, so we can write for all\n",
    "$\\mathbf{x}$ and $\\mathbf{y}$ in the domain of $g$\\noindent \n",
    "\\begin{equation}\n",
    "\\left\\Vert \\nabla f\\left(g\\left(\\mathbf{x}\\right)\\right)-\\nabla f\\left(g\\left(\\mathbf{y}\\right)\\right)\\right\\Vert _{2}\\leq J\\left\\Vert g\\left(\\mathbf{x}\\right)-g\\left(\\mathbf{y}\\right)\\right\\Vert _{2}\\leq JK\\left\\Vert \\mathbf{x}-\\mathbf{y}\\right\\Vert _{2}.\n",
    "\\end{equation}\n",
    "\n",
    "Therefore, $f\\left(g\\right)$ has Lipschitz continuous gradient with\n",
    "constant $JK$. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.10"
  },
  "toc": {
   "colors": {
    "hover_highlight": "#DAA520",
    "navigate_num": "#000000",
    "navigate_text": "#333333",
    "running_highlight": "#FF0000",
    "selected_highlight": "#FFD700",
    "sidebar_border": "#EEEEEE",
    "wrapper_background": "#FFFFFF"
   },
   "moveMenuLeft": true,
   "nav_menu": {
    "height": "327.633px",
    "width": "252px"
   },
   "navigate_menu": true,
   "number_sections": false,
   "sideBar": true,
   "threshold": 4,
   "toc_cell": true,
   "toc_section_display": "block",
   "toc_window_display": false,
   "widenNotebook": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
